Hierarchical cluster analysis of SAGE data for cancer profiling
Abstract
In this paper we present a method for clustering SAGE (Serial Analysis of Gene Expression) data to detect similarities and dissimilarities between different types of cancer on the subcellular level. The data, however, are extremely high-dimensional, and the method of measurement introduces many errors and missing values, which challenges any clustering algorithm. Therefore, we introduce special preprocessing techniques to reduce these errors and to restore missing data. These techniques are tailored to the process that generates the data and make only very conservative changes. Furthermore, we present a new subspace selection technique that uses the Wilcoxon test to identify a relevant subset of attributes (genes). This is a general technique that can be applied to select subspaces for clustering whenever some high-level categories of interest are known for the data (such as cancerous and noncancerous). Finally, we discuss the results of applying the clustering algorithm OPTICS to the SAGE data, before and after our preprocessing steps.
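The subspace selection idea can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption: the two-sample Wilcoxon rank-sum variant with a tie-corrected normal approximation, the significance threshold `alpha`, and the hypothetical helper names `average_ranks`, `rank_sum_p_value`, and `select_genes`. The toy tag counts are invented for illustration only.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_p_value(group_a, group_b):
    """Two-sided rank-sum p-value (normal approximation, tie-corrected)."""
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    pooled = list(group_a) + list(group_b)
    ranks = average_ranks(pooled)
    w = sum(ranks[:n_a])              # rank sum of the first group
    mean_w = n_a * (n + 1) / 2        # expected rank sum under H0
    tie_counts = {}
    for v in pooled:
        tie_counts[v] = tie_counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    var_w = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w <= 0:                    # all values identical: no evidence
        return 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def select_genes(expression, is_cancer, alpha=0.05):
    """Keep genes whose counts differ between the two known categories.

    expression: dict mapping gene name -> list of counts, one per library.
    is_cancer:  list of booleans, one per library (assumed category labels).
    """
    selected = []
    for gene, counts in expression.items():
        cancer = [c for c, flag in zip(counts, is_cancer) if flag]
        normal = [c for c, flag in zip(counts, is_cancer) if not flag]
        if rank_sum_p_value(normal, cancer) < alpha:
            selected.append(gene)
    return selected


# Toy data: five normal libraries followed by five cancerous ones.
labels = [False] * 5 + [True] * 5
toy = {
    "gene_A": [1, 2, 3, 4, 5, 10, 11, 12, 13, 14],   # clearly shifted
    "gene_B": [5, 1, 9, 3, 7, 6, 2, 8, 4, 10],       # interleaved
    "gene_C": [0] * 10,                              # constant
}
relevant = select_genes(toy, labels)
```

Under these assumptions, the retained genes define the reduced subspace that would then be handed to a clustering algorithm such as OPTICS. With very few libraries per category an exact test would be preferable to the normal approximation used here.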
Similar resources
Multivariate Chemometrics with Regression and Classification Analyses in Heroin Profiling Based on the Chromatographic Data.
The purpose of this work is to promote and facilitate forensic profiling and chemical analysis of illicit drug samples in order to determine their origin, methods of production and transfer through the country. The article is based on the gas chromatography analysis of heroin samples seized from three different locations in Serbia. Chemometric approach with appropriate statistical tools (multip...
Efficient Agglomerative clustering Method for Micro Array Data on Breast Cancer Outcome
Analysis of micro arrays presents a number of unique challenges for data mining. The main types of data analysis needed for biomedical applications include clustering: finding new biological classes or refining an existing one. We compare the various experimental clustering results of S+ from Insightful, XCluster at Stanford, Eisen’s Cluster, and Rousseau & Kaufman’s Web clusters for single linkag...
Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members
Graphs have many applications in real-world problems. When we deal with a huge volume of data, analyzing it is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but we do not have any choice to select the number of clusters and the number of members ...
A Seriation Approach for Visualization-Driven Discovery of Co-Expression Patterns in Serial Analysis of Gene Expression (SAGE) Data
BACKGROUND: Serial Analysis of Gene Expression (SAGE) is a DNA sequencing-based method for large-scale gene expression profiling that provides an alternative to microarray analysis. Most analyses of SAGE data aimed at identifying co-expressed genes have been accomplished using various versions of clustering approaches that often result in a number of false positives. PRINCIPAL FINDINGS: Here we...
Publication date: 2001